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H ■ Abstract 
03 

The method of potential solutions of Fokker-Planck equations is used to develop a transport equation for the 
joint probability of TV coupled stochastic variables with the Dirichlet distribution as its asymptotic solution. 
To ensure a bounded sample space, a coupled nonlinear diffusion process is required: the Wiener-processes 
in the equivalent system of stochastic differential equations are multiplicative with coefficients dependent 
on all the stochastic variables. Individual samples of a discrete ensemble, obtained from the stochastic 
Q-ij process, satisfy a unit-sum constraint at all times. The process may be used to represent realizations of a 

^ (— i fluctuating ensemble of N variables subject to a conservation principle. Similar to the multivariate Wright- 

Fisher process, whose invariant is also Dirichlet, the univariate case yields a process whose invariant is the 
beta distribution. As a test of the results, Monte-Carlo simulations are used to evolve numerical ensembles 
toward the invariant Dirichlet distribution. 
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1 Objective 

(N : 

■ We develop a Fokker-Planck equation whose statistically stationary solution is the Dirichlet distri- 

ff"} | bution [1, 2, 3]. The system of stochastic differential equations (SDE), equivalent to the Fokker- 

Planck equation, yields a Markov process that allows a Monte-Carlo method to numerically evolve 
an ensemble of fluctuating variables that satisfy a unit-sum requirement. A Monte Carlo solution 
is used to verify that the invariant distribution is Dirichlet. 

The Dirichlet distribution is a statistical representation of non-negative variables subject to a 
unit-sum requirement. The properties of such variables have been of interest in a variety of fields, 
including evolutionary theory [4], Bayesian statistics [5], geology [6, 7], forensics [8], econometrics 
[9], turbulent combustion [10], and population biology [11]. 

2 Preview of results 

The Dirichlet distribution [1, 2, 3] for a set of scalars < Y a , a = 1, . . . , N — 1, Yl a =i < 1> is 
given by 

r ( J2 N = u ) N 

lla=l 1 \Ua) a= x 

where uj a >0 are parameters, Yn = 1 — J2^=i ^/3> an d rQ denotes the gamma function. We derive 
the stochastic diffusion process, governing the scalars, Y a , 



dY a (t) = -f [S a Y N - (1 - S a )Y a ]dt + y/ Ka Y a Y N dW a (t), a = l,...,N-l, (2) 
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3 Development of the diffusion process 



where dW a (t) is an isotropic vector- valued Wiener process [12] , and b a > 0, n a > 0, and < S a < 1 are 
coefficients. We show that the statistically stationary solution of Eq. (2) is the Dirichlet distribution, 
Eq. (1), provided the SDE coefficients satisfy 

^(l-Si) = --- = ^(l-^-i). (3) 

Ki KN-1 

The restrictions imposed on the SDE coefficients, b a , K a , and S a , ensure reflection towards the 
interior of the sample space, which is a generalized triangle or tetrahedron (more precisely, a 
simplex) in N — 1 dimensions. The restrictions together with the specification of the -/V th scalar as 
Y N = 1- J2p=i Y P ensure 

N 

E y * = L ( 4 ) 

a=l 

Indeed, inspection of Eq. (2) shows that for example when Y\ = 0, the diffusion is zero and the drift 
is strictly positive, while if Y\ = l, the diffusion is zero (Yat = 0) and the drift is strictly negative. 

3 Development of the diffusion process 

The diffusion process (2) is developed by the method of potential solutions. 
We start from the Ito diffusion process [12] for the stochastic vector, Y a , 

dY a (t) = a a (Y)dt + b a p(Y)dW(3(t), a,(3 = l,...,N-l, (5) 

with drift, a a (Y), diffusion, 6 a/ g(Y), and the isotropic vector- valued Wiener process, dWp(t), where 
summation is implied for repeated indices. Using standard methods given in [12] the equivalent 
Fokker-Planck equation governing the joint probability, J^(Y, t), derived from Eq. (5), is 

with diffusion B a p = ba^byp. Since the drift and diffusion coefficients are time-homogeneous, 
a a (Y,t) = a a (Y) and B a @(Y,t) = B a p(Y), Eq. (5) is a statistically stationary process and the 
solution of Eq. (6) converges to a stationary distribution, [12] Sec. 6.2.2. Our task is to specify the 
functional forms of a a (Y) and b a p(Y) so that the stationary solution of Eq. (6) is @(Y), defined 
by Eq. (1). 

A potential solution of Eq. (6) exists if 

-Wr = B ap\^-^)=- Wp > «,/3, 7 = l,...,iV-l, (7) 

is satisfied, [12] Sec. 6.2.2. Since the left hand side of Eq. (7) is a gradient, the expression on 
the right must also be a gradient and can therefore be obtained from a scalar potential denoted 
by <j>(Y). This puts a constraint on the possible choices of a a and B a R and on the potential, as 
(f>, a p = 4>,p a must also be satisfied. The potential solution is 

jr(Y) = expHKY)]. (8) 

Now functional forms of a a (Y) and B a p(Y) that satisfy Eq. (7) with J£"(Y) = 3i(Y) are sought. 
The mathematical constraints on the specification of a a and B a p are as follows: 

1. B a R must be symmetric positive semi-definite. This is to ensure that 
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• the square-root of B a p (e.g. the Cholesky-decomposition, b a p) exists, required by the 
correspondence of the SDE (5) and the Fokker-Planck equation (6), 

• Eq. (5) represents a diffusion, and 

• det(B a p) ^ 0, required by the existence of the inverse in Eq. (7). 
2. For a potential solution to exist Eq. (7) must be satisfied. 

With J?(Y) = @(Y) Eq. (8) shows that the scalar potential must be 



N 



(Y) = -53(w a -l)lnr a . 



a=l 



It is straightforward to verify that the specifications 



(Y) = -?-[S a Y N -(l-S a )Y a ], 



B a p(Y) 



^oXoXn for a = (3, 
for a 7^ f3, 



(9) 

(10) 
(11) 



satisfy the above mathematical constraints, 1. and 2. Here b a > 0, K a > 0, < S a < 1, and 
Yv = l — Yl^=i Summation is not implied for Eqs. (9-11). 

Substituting Eqs. (9-11) into Eq. (7) yields a system with the same functions on both sides 
with different coefficients, yielding the correspondence between the iV coefficients of the Dirichlet 
distribution, Eq. (1), and the Fokker-Planck equation (6) with Eqs. (10-11) as 



LO a — S a , 
K a 

uj n = —(1 - Si) 

Ki 



a = 1,...,N- 1, 
b N 



KN-l 



-(1-Sa^i). 



(12) 
(13) 



For example, for TV = 3 one has Y = (Y\, Y 2 , Y = 1 — Yi — Y2) and from Eq. (9) the scalar potential 
is 



- 0(Yi, Y 2 ) = (cji - 1) In Yy + ( W2 - 1) In Y 2 + {cj 3 - 1) ln(l - Yi - Y 2 ). 
Eq. (7) then becomes the system 



U\ — 1 U! 3 — 1 

u 2 — 1 U3 - 1 



Y, 



Y 



Kl ) Yi 

^2 / Y 2 



Kl 

K-2 



-(l-5l)-l 



(1 - 5 2 ) - 1 



1 

Y 3 ' 
1 



which shows that by specifying the parameters, w a , of the Dirichlet distribution as 

ui = — Si, 

Kl 

h G 
w 2 — — o 2 , 

Wg= k(l-Si) = — (l-ft), 
Kl K 2 



(14) 

(15) 
(16) 

(17) 
(18) 
(19) 



the stationary solution of the Fokker-Planck equation (6) with drift (10) and diffusion (11) is 
Si(Y,u) for iV = 3. The above development generalizes to N variables, yielding Eqs. (12-13), and 



4 Corroborating that the invariant distribution is Dirichlet 



reduces to the beta distribution, a univariate specialization of 3l for N = 2, where Y\ = Y and 
Y 2 = l-Y, see [13]. 

If Eqs. (12-13) hold, the stationary solution of the Fokker-Planck equation (6) with drift (10) 
and diffusion (11) is the Dirichlet distribution, Eq. (1). Note that Eqs. (10-11) are one possible way 
of specifying a drift and a diffusion to arrive at a Dirichlet distribution; other functional forms may 
be possible. The specifications in Eqs. (10-11) are a generalization of the results for a univariate 
diffusion process, discussed in [13, 14], whose invariant distribution is beta. 

The shape of the Dirichlet distribution, Eq. (1), is determined by the iV coefficients, u a . Eqs. 
(12-13) show that in the stochastic system, different combinations of b a , S a , and K a may yield the 
same oo a and that not all of b a , S a , and K a may be chosen independently to keep the invariant 
Dirichlet. 

4 Corroborating that the invariant distribution is Dirichlet 

For any multivariate Fokker-Planck equation there is an equivalent system of ltd diffusion processes, 
such as the pair of Eqs. (5-6) [12]. Therefore, a way of computing the (discrete) numerical solution 
of Eq. (6) is to integrate Eq. (5) in a Monte-Carlo fashion for an ensemble [15]. Using a Monte- 
Carlo simulation we show that the statistically stationary solution of the Fokker-Planck equation 
(6) with drift and diffusion (10-11) is a Dirichlet distribution, Eq. (1). 

The time-evolution of an ensemble of particles, each with N = 3 variables (Yi,Y 2 ,Y^), is nu- 
merically computed by integrating the system of equations (5), with drift and diffusion (10-11), for 
N = 3 as 



dY}' 


_ & 1 

2 


~SiY? 


- (i - s 1 )y} { ^' 


dt+y 




*3 




(20) 




_ fe 2 
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S 2 Y? 


- (1 - S 2 )Yf 


dt + y 


1^ 


)v(t 


W 2 (i) 


(21) 


V (i) 


= 1 - 




vW 
J 2 










(22) 



for each particle i. In Eqs. (20-21) dW\ and dW 2 are independent Wiener processes, sampled from 
Gaussian streams of random numbers with mean (dW a ) = and covariance (dW^dW^) = 5 a pdt. 
400,000 particle-triplets, (Y\, Y 2 , 13), are generated with two different initial distributions, displayed 
in the upper-left of Figures 1 and 2, a triple-delta and a box, respectively. Each member of both 
initial ensembles satisfy Y2a=i ^ a = Eqs. (20-22) are advanced in time with the Euler-Maruyama 
scheme [16] with time step At = 0.05. Table 1 shows the coefficients of the stochastic system (20- 
22), the corresponding parameters of the final Dirichlet distribution, and the first two moments 
at the initial times for the triple-delta initial condition case. The final state of the ensembles are 
determined by the SDE coefficients, constant for these exercises, also given in Table 1, the same 
for both simulations, satisfying Eq. (19). 

The time-evolutions of the joint probabilities are extracted from both calculations and displayed 
at different times in Figures 1 and 2. At the end of the simulations two distributions are plotted at 
the bottom-right of both figures: the one extracted from the numerical ensemble and the Dirichlet 
distribution determined analytically using the SDE coefficients - in excellent agreement in both 
figures. The statistically stationary solution of the developed stochastic system is the Dirichlet 
distribution. 

For a more quantitative evaluation, the time evolution of the first two moments, fi a = (Y a ) = 
Jo fo Ya<^(Yi, Y 2 )dYidY 2 , and (y a y/3} = ((Y a — (Y a ))(Yp — (Yp))), are also extracted from the nu- 
merical simulation with the triple-delta-peak initial condition as ensemble averages and displayed 
in Figures 3 and 4. The figures show that the statistics converge to the precise state given by the 
Dirichlet distribution that is prescribed by the SDE coefficients, see Table 1. 
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The solution approaches a Dirichlet distribution, with non-positive covariances [2], in the sta- 
tistically stationary limit, Figure 4(b). Note that during the evolution of the process, 0<t <80, the 
solution is not necessarily Dirichlet, but the stochastic variables sum to one at all times. The point 
(Yi, Y2), governed by Eqs. (20-21), can never leave the (N — l)-dimensional (here iV = 3) convex 
polytope and by definition I3 = 1 — Y\ — Yi . The rate at which the numerical solution converges to 
a Dirichlet distribution is determined by the vectors b a and n a . 

The above numerical results confirm that starting from arbitrary realizable ensembles the so- 
lution of the stochastic system converges to a Dirichlet distribution in the statistically stationary 
state, specified by the SDE coefficients. 

5 Relation to other diffusion processes 

It is useful to relate the Dirichlet diffusion process, Eq. (2), to other multivariate stochastic diffusion 
processes with linear drift and quadratic diffusion. 

A close relative of Eq. (2) is the multivariate Wright-Fisher (WF) process [11], used extensively 
in population and genetic biology, 

dY a {t) = -K- ooY a )dt + ^Y Q (5 a p-Yp)dW aP (t), a = 1, . . . ,N - 1, (23) 

0=1 

where 5 a /s is Kronecker's delta, co = Ylp=i U P with oj a defined in Eq. (1) and, Yn = 1 — J2p=i Yp- 
Similarly to Eq. (2), the statistically stationary solution of Eq. (23) is the Dirichlet distribution 
[17]. It is straightforward to verify that its drift and diffusion also satisfy Eq. (7) with & = 
i.e. WF is a process whose invariant is Dirichlet and this solution is potential. A notable difference 
between Eqs. (2) and (23), other than the coefficients, is that the diffusion matrix of the Dirichlet 
diffusion process is diagonal, while that of the WF process it is full. 

Another process similar to Eqs. (2) and (23) is the multivariate Jacobi process, used in econo- 
metrics, 

JV-l 

dY a (t) = a(Y a -ir a )dt + y^dW a (t)-YY a ^cYg~dWp(t), a = l,...,N (24) 

p=i 

of Gourieroux & Jasiak [9] with a < 0, c > 0, 7r Q > 0, and X^flLi = 1- 
In the univariate case the Dirichlet, WF, and Jacobi diffusions reduce to 

dY(t) = \{S~ Y)dt + y/ K Y(l - Y)dW(t), (25) 

see also [13], whose invariant is the beta distribution, which belongs to the family of Pearson 
diffusions, discussed in detail by Forman & Sorensen [14]. 

6 Summary 

The method of potential solutions of Fokker-Planck equations has been used to derive a transport 
equation for the joint distribution of N fluctuating variables. The equivalent stochastic process, 
governing the set of random variables, 0<Y a , a = l, . . . , N — 1, Yla=i < 1> reads 

dY a (t) = ^ [S a Y N - (1 - S a )Y a ]dt + y/ Ka Y a Y N dW a {t), a = l,...,N-l, (26) 

where Yn = 1 — J2p=i , and b a , K a and S a are parameters, while dW a (t) is an isotropic Wiener 
process with independent increments. Restricting the coefficients to b a > 0, n a > and < S a < 1, 
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6 Summary 



and denning Yn as above ensure ^ Q =i = 1 an d that individual realizations of (Y±, Y2, ■ ■ ■ , IV) 
are confined to the (A^— l)-dimensional convex polytope of the sample space. Eq. (26) can therefore 
be used to numerically evolve the joint distribution of N fluctuating variables required to satisfy a 
conservation principle. Eq. (26) is a coupled system of nonlinear stochastic differential equations 
whose statistically stationary solution is the Dirichlet distribution, Eq. (1), provided the coefficients 
satisfy 



In stochastic modeling, one typically begins with a physical problem, perhaps discrete, then 
derives the stochastic differential equations whose solution yields a distribution. In this paper 
we reversed the process: we assumed a desired stationary distribution and derived the stochastic 
differential equations that converge to the assumed distribution. A potential solution form of the 
Fokker-Planck equation was posited, from which we obtained the stochastic differential equations 
for the diffusion process whose statistically stationary solution is the Dirichlet distribution. We 
have also made connections to other stochastic processes, such as the Wright-Fisher diffusions of 
population biology and the Jacobi diffusions in econometrics, whose invariant distributions possess 
similar properties but whose stochastic differential equations are different. 
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Fig. 1: Time evolution of the joint probability, J^~(ii, Y2), extracted from the numerical solution of 
Eqs. (20-22). The initial condition is a triple-delta distribution, with unequal peaks at the 
three corners of the sample space. At the end of the simulation, t = 140, the solid lines are 
that of the distribution extracted from the numerical ensemble, dashed lines are that of a 
Dirichlet distribution to which the solution converges in the statistically stationary state, 
implied by the constant SDE coefficients, sampled at the same heights. 
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Fig. 2: Time evolution of the joint probability, Y-j)? extracted from the numerical solution of 

Eqs. (20-22). The top-left panel shows the initial condition: a box with diffused sides. By 
t = 160, bottom-right panel, the distribution converges to the same Dirichlet distribution 
as in Figure 1. 
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Fig. 3: Time evolution of the means extracted from the numerically integrated system of Eqs. (20- 
22) starting from the triple-delta initial condition. Dotted-solid lines - numerical solution, 
dashed lines - statistics of the Dirichlet distribution determined analytically using the con- 
stant coefficients of the SDE, see Table 1. 




Fig. 4: Time evolution of the second central moments extracted from the numerically integrated 
system of Eqs. (20-22) starting from the triple-delta initial condition. The legend is the 
same as in Figure 3. 
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Tab. 1: Initial and final states of the Monte-Carlo simulation starting from a triple-delta. The 
coefficients, b±, />2> Si, S 2 , Ki, K2, of the system of SDEs (20-22) determine the distribution 
to which the system converges. The Dirichlet parameters, implied by the SDE coefficients 
via Eqs. (17-19), are in brackets. The corresponding statistics are determined by the well- 
known formulae of Dirichlet distributions [2] . 



Initial state: triple-delta, SDE coefficients and the statistics of their implied Dirichlet 
see Figure 1 distribution in the stationary state 
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